Identification of novel STAT3 inhibitors for liver fibrosis, using pharmacophore-based virtual screening, molecular docking, and biomolecular dynamics simulations

The signal transducer and activator of transcription 3 (STAT3) plays a fundamental role in the growth and regulation of cellular life. Activation and over-expression of STAT3 have been implicated in many cancers including solid blood tumors and other diseases such as liver fibrosis and rheumatoid arthritis. Therefore, STAT3 inhibitors are be coming a growing and interesting area of pharmacological research. Consequently, the aim of this study is to design novel inhibitors of STAT3-SH3 computationally for the reduction of liver fibrosis. Herein, we performed Pharmacophore-based virtual screening of databases including more than 19,481 commercially available compounds and in-house compounds. The hits obtained from virtual screening were further docked with the STAT3 receptor. The hits were further ranked on the basis of docking score and binding interaction with the active site of STAT3. ADMET properties of the screened compounds were calculated and filtered based on drug-likeness criteria. Finally, the top five drug-like hit compounds were selected and subjected to molecular dynamic simulation. The stability of each drug-like hit in complex with STAT3 was determined by computing their RMSD, RMSF, Rg, and DCCM analyses. Among all the compounds Sa32 revealed a good docking score, interactions, and stability during the entire simulation procedure. As compared to the Reference compound, the drug-like hit compound Sa32 showed good docking scores, interaction, stability, and binding energy. Therefore, we identified Sa32 as the best small molecule potent inhibitor for STAT3 that will be helpful in the future for the treatment of liver fibrosis.


Pharmacophore modeling and virtual screening
A Pharmacophore model was developed in the MOE (Molecular Operating Environment) software 11 for the virtual screening of synthetic and commercially available compounds databases.The model was then validated and the validated Pharmacophore model as 3D query was utilized for the virtual screening.To identify and retrieve the potent anti-STAT3 compounds, different databases including in-house, ZINC, Phytochemicals, and South African databases were screened 12 .The screened hits best fitted on the applied Pharmacophore were further filtered by ADMET properties to short-list compounds having drug-likeness.In order to determine the pharmacokinetic properties of the retrieved compounds, the Lipinski's rule of five (RoF) was applied.Determination of drug-likeness property is particularly used to demonstrate new lead hits by screening of compound databases 13 .

Molecular docking of hit compounds and STAT3
The possible interaction of protein-ligands complex is explored by molecular docking study.Molecular docking helps topredict intermolecular framework formed among small molecule and protein, suggesting interaction modes leading to protein inhibition.In this study, all the drug-like retrieved hits were docked into the active site of STAT3.For docking purpose, the docking protocol implemented in MOE 2016 software was used 14 .For docking of STAT3, the structure of its SH2 domain (PDBID:1BG1) was taken from protein data bank https:// www.www.nature.com/scientificreports/rcsb.org/ 15 This SH2 domain (comprised of 499-688 amino acid sequence) of STAT3 is phosphorylated 16 .STAT3 receptor and its reference i.e., standard inhibitor (Sorafenib) was initially Protonated and subjected to energy minimization with default parameters (gradient: 0.05, Force Field: MMFF94x) of MOE 2016.08 software 17 .The energy converged system was subjected to docking.The result of docking was analyzed on the basis of protein/ compounds interactions and docking scores.In order to validate the stability and interaction of drug-like hits in the binding site of STAT3-SH2 domain, the complexes were subjected to molecular dynamics simulations.

Molecular dynamics simulations
MD simulation of the best five docking score compounds and the standard compound (Sorafenib) were carried out to unveil the stability of the protein-ligand complexes.Molecular dynamic simulations and analysis for each complex was performed in AMBER 20 18 using the force field (ff14SB).During the MD protocol, each system was neutralized by utilizing the LEAP module counter ions such as Na + and Cl − .TIP3P water model was filled in octahedral box with 10 Å parameter to solvateeach system 14 .For long-range electrostatic interactions a distance of 10 Å was used.Additionally, the Particle Mesh Ewald (PME) algorithm was used to deal with long-range electrostatic interactions.By using the SHAKE option and the Particle Mesh Ewald (PME) method, covalent bonds involving hydrogen atoms were optimized in each complex system 16 .The temperature was controlled by Langevin dynamics.Finally, 100 ns MD simulations (two replicas) for each complex were performed on the GPU using CUDA version of PMEMD.For the analyses of MD trajectory, Amber 20's CPPTRAJ module was used.

ADME predictions
Evaluation of absorption, distribution, metabolism, and excretion (ADME) are the main criteria to recommend a molecule as a drug-like candidate.Hence, computational approaches are utilized to predict the ADME properties of chemical molecules and advocate their drug-likeness.In this study, we used SwissADME http:// www.swiss adme.ch online server 19,20 and assessed the ADME properties such as GIT absorption, bioavailability, and solubility profile of the selected compounds 21 .

Binding free energy calculation
The molecular Mechanic/Poisson-Boltzmann Surface Area (MM-PBSA) approach 22 is an efficient and reliable method to model molecular recognition, such as for protein-ligand binding interactions.To estimate the strength of binding of STAT3 in complex with either the reference compound or any of the candidate hits, the binding free energy was computed by employing the MMPBSA.pyscript.The MMPBSA was computed for the last 500 frames (10 ns) of each simulated complex.The below equation was used to calculate the free energy 18 The equations describe G bind shows total binding free energies, ΔG complex use for complex free energies, and the remaining terms stand for the corresponding free energies of the receptor protein and ligand.

Pharmacophore model generation
Ligand-based Pharmacophore model was generated.Sorafenib was selected as a reference for Pharmacophore model generation.Pharmacophorewas generated from a total of seven (7) features.Thus, the resulted Pharmacophore was comprised of two hydrogen bond acceptor (HBA), two hydrogen bond donor (HBD), one aromatic (Aro) and two hydrophobic (HY) features (Fig. 1).Finally, in-house, ZINC, phytochemicals and South African compound libraries were screened based on the generated Pharmacophore.As a result, many compounds were found that interacted more strongly with the targeted compound than the reference compound.

Pharmacophore model validation
Internal database was used in this study as test set forPharmacophore model validation.The test set contained 852 inactive compounds for STAT3 and the rest of 48 molecules were known inhibitors.The validated model has the ability to differentiate between the active and inactive molecules.This model was used to investigateand carry out virtual screening.Many important parameters were calculated like total hits (Ht), active hits (Ha), % yield of actives, % ratio of actives, enrichment factor (E), and goodness-of-hit score (GH) which are placed in (Table 1).GH Score of 0.7-0.8indicates a validated model 23 .The model was used for further virtual screening purpose.The higher the E value, the better the model's ability to identify active compounds.With an E-value of 15, the model identified 40 active hits out of 900 compounds screened, indicating that the model was able to differentiate active frominactive molecules.A Gunner Henry's score of 0.7-0.8preferred as a good model.So the goodness score for Pharmacophore model was 0.79.The validation of our generated Pharmacophore suggested that the resultant Pharmacophore was good enough to be used for virtual screening of different databases against the identification of STAT3 candidate hit compounds.

Virtual screening
The validated model was utilized to screen in-house database (1700 molecules), zinc database (12,000 compounds), phytochemicals (5000 compounds) and South African database (1000 compounds).The Pharmacophore model retrieved340 compounds as the best fitted compounds on the Pharmacophore features against the STAT3 candidate hits molecules.

Molecular docking
The ADMET evaluation of the Pharmacophore-retrieved hits filtered a total of 278 hits as drug-like molecules which were further docked in the active site ofSTAT3 which is comprised of Arg609, Arg595, Lys591, Arg609, Ser611, Glu612, Ser613, Gln633, Ile634, Gln635, Ser636, Val637, Gln638, Met586, Gly587, Phe588, Ilu589, and Ser590 residues.The top 25 ligands of STAT3 were selected according to the S-score and interaction type.Table 2 suggests the S-Score and interacting residues of the top 25 hits with STAT3. Figure 2 represents the interactions of best hits and the reference compound with STAT3 receptor.
Among the 25 docked compounds, 5459840, SANC00347, ZINC47009207 and Sa32 were the most potent with docking score − 11.2812, − 10.2663, − 9.2654 and − 9.1425, respectively.Most of the compounds showed the most accurate binding and interaction with the receptors compared to reference compound.The docking index for Sorafenib was − 7.2334.

ADMET properties determination
Table 3 shows the drug likeness of the molecules 5459840; ZINC47009207, Sa32 and SANC00347 identified by online ADME/Tox tool of Swiss-ADME using the SMILES nomenclature.To show the ADME/Tox profile, we decided on maximum fundamental ADME/Tox functionality from online server.Table 3 displays the pharmacokinetic profiles of the selected compounds.Drug-likeness of a compound is strongly stimulated with the aid of 2 properties of water solubility and hydrophobicity (predicted using the ALI log S, log S, and ESOL methods) 24 .Figure 3 shows the drug-like properties (molecular weight etc.) obtained from ADME/Tox tool.Orally active drugs must follow Lipinski rules five.

Molecular dynamic simulation
To investigate the structural stability of the docked complexes, 100 ns molecular dynamics simulations were executed to determine the stability of the top lead compounds in complex with receptor.MD simulations were performed by using Amber program software.Molecular dynamics (MD) simulations of the top five (05) hits with good interactions, binding energies, best docking scores, and have satisfied ADMET properties compared to the reference molecule was carried out.While, stability of the complexes ZINC47009207 (green) compound, Sa32 complex (Blue), and SANC00347 (sky-blue) were evaluate based on root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg) and DCCM.Moreover, to recognize the structural variability in protein over a time of 100 ns, the simulated complex with existence of lead compounds at particular trajectory frames that is frame 1, 20, 40, 60, 80, and 100 were analyzed and are shown in Figs. 4, 5, 6 and 7.

RMSD analysis
The predicted complexes stability was further checkedby MD simulations.After 100 ns MD simulations, the RMSD of the backbone of STAT3 and the ligands at 300 K was plotted against time (ns).This was shown in Fig. 4, the RMSDs of Sa32 and ZINC47009207 were found to be comparatively firm and stable at around 2.7 and 3 Å, respectively.Sa32 wasstable from the beginning in replica 1 then there were some fluctuations from 40 to 60 ns.The Sa32 in replica2 revealed minor deviations from 5-10 to 70-85 ns but overall the complex RMSD was stable during the 100 ns MD run.ZINC47009207 in replica 1has been stable since beginning with some unstable behavior from the 60 ns to 70 ns as compared to Ref complex.The ZINC47009207in replica2 revealed an unstable behavior during 21-70 ns but then the system gain stability and remained stable till the end of MD run.The average RMSD value of the reference compound was found as 3.1 Å in replica 1.The RMSD of Ref compound in replica2 was stable during the first 25 ns and then the RMSD increases from 26 ns to 50 ns after that the system gain stability and remained stable till the 100 ns.The compound545984 in replica 1around 20 ns Vol:.( 1234567890 indicate stable during the first 50 ns after that a slight increase in RMSD was observed and fluctuations were observed during 51-82 ns after that, the system gain stability and remained stable.The compound SANC00347 shows stability from start to 20 ns then gradually shows fluctuation from 21 to 80nsin replica 1.Then from 85 to 100 ns they show stability.The RMSD value of SANC00347 was found as 4.5 Å during MD simulation.The RMSD of SANC00347 in replica2 indicates a similar pattern to that of the replica1.The RMSD was stable during the first 20 ns but major deviations were observed from 20 to70 ns and then get stability and remained stable during the last 30 ns.The RMSD plots of replica 2 are shown in Fig. S4 (Supplementary).Except the compound SANC00347 the RMSD of all other compounds indicated high stability during the entire 100 ns MD simulation.

RMSF analysis
The RMSF is a significant factor that produces data regarding the structural flexibility of each and every residue i-e remains in the system.In replica 1 the RMSF values of the Ref, 5459840, ZINC47009207, Sa32 and SANC00347 complexes were calculated (Fig. 5).In RMSF we found that fluctuation was found in the region including 18-21(THR516, LYS517, ARG518, GLY519) 39-41(VAL537, ASN538, TYR539) 43-51(SER540, GLY541, CYS542, GLN543, ILE544, THR545, TRP546, ALA547, LY548, PHE549, CYS550, LYS551) 122-127 (THR620, PHE621, THR622, TRP623, VAL624, GLU625) 160-164(LYS658, ILE659, MET660, ASP661, ALA662).The fluctuating residues were not present in the binding site and the active site residues showed stable behavior which indicates strong binding of compounds with the receptor.The RMSF value of SANC00347 was very high up to 12 Å while the RMSF value of all other complexes was less than 4 Å.RMSD analysis was in agreement with the RMSF analysis which indicated that among all the complexes SANC00347 was unstable as compared to the control and other complexes.In replica 2 a similar pattern of RMSF was observed as the replica 1.In all the systems the RMSF was found less than 4 Å except the SANC00347.The RMSF plot of SANC00347 was observed as 13 Å.The residues such as 25-40 and 125-135 in the complexes 5459840 and the SANC00347 revealed high fluctuations during the MD simulation.The residues of Sa32 and ZINC47009207 revealed minor fluctuations during MD simulation.The RMSF plots of all the systems in replica 2 are shown in Fig. S5 (Supplementary).

Radius of gyration (Rg)
STAT3 protein compactness with their complexes was evaluated by utilizing the radius of gyration (Rg).The analysis of Rg confirmed that all complexes of STAT3 i.e., Sa32 and ZINC47009207 while Fig. 6

DCCM analysis
The aim of the DCCM analysis is to study the changes in the binding pocket conformation of STAT3 protein when interacting with various ligands.This was accomplished through a post Molecular Docking analysis of the www.nature.com/scientificreports/backbone of Ca during MD simulations lasting 100 ns, to observe any fluctuations and correlated motions.To determine binding affinity of selected or targeted ligands in the active site of the correlation and anti-correlation motions of STAT3.The dynamics matrix of cross-correlation plots of 100 ns MD simulations was plotted for every complex.The analysis of DCCM was to evaluate correlation between residuals and this was performed to elucidate the motion correlation of the residuals.Movements of amino acids are in parallel direction, indicating strongly correlated movements called positive correlation.On the other side, if the amino acids movements are anti-parallel then this shows movement of anti-correlation.Negative & positive correlations were calculated by DCCM analysis and represented as the cyan to dark blue region and the red to green region as shown in Fig. 7A-E

Free energy binding
The binding energy of ligand-STAT3 complexes was calculated using MMPBSA methods.However, van der Waals and electrostatic interactions and non-polar solvation energy negatively contribute to the total interaction energy while only polar solvation energy positively contributes to total free binding energy.In terms of negative contribution, van der Waals interaction gives much larger contribution than electrostatic interactions for all the cases.The non-polar free energy contributes relatively less as compared to the total binding energy.This indicates that non-polar solvation energy, van der Waals and electrostatic interaction together contribute to the STAT3 polyphenol complex stability 25 .MMPBSA was calculated for the last 10 ns from the total of 100 ns MD trajectory.The binding free energies ΔG Bind of complexes Ref, 5459840, ZINC47009207, Sa32 and SANC00347 calculated during the 100 ns simulation were found to be − 9.8027 kcal/mole, − 11.5849 kcal/mole, − 10.6547 kcal/mole and − 24.0920 kcal/mole respectively.The details of MMPBSA calculation of the complexes are summarized in Table 4.In Table 4 shown the calculated values of the molecular mechanic's VDW, EFL, EPB, ENPOLAR and EDIPER under ΔPBSA between the protein and ligand during the simulation were evaluated from the MM-PBSA calculation.The van der Waals VDW interactions of ligands − 29.1030 kcal/mole (Reference), − 29.6634 kcal/ mole (5459840), − 43.4243 kcal/mole (ZINC47009207), − 34.0802 kcal/mole (Sa32) and − 35.0421 kcal/mole (SANC00347), all the complex systems showed the negative values of VDWso all complexes illustrated sufficient hydrophobic interactions.The results of the free energy calculation showed that the complex having best binding energy, also best RMSD and best binding affinity toward receptor.As compared to Ref compound binding energy of SANC00347 was not good enough.

Discussion
Liver fibrosis is an initial stage of cirrhosis and results from abundant chronic liver disorders and a wide process that presents with numerous depositions of extracellular matrix proteins that change the liver architecture and functionality resulting in cirrhosis and failed organ 26 .Liver disorders are responsible for approximately 2million fatalities each year worldwide, with 1million of these deaths attributable to cirrhosis severity.Liver fibrosis is now at the top of list among ten causes of death globally 27 .Therefore, some STAT3 inhibitors exist such as There are some STAT3 SH2 domain inhibitors that are Shikonin a natural naphthoquinone derivative that has been recognized as STAT3 inhibitory potency.Napabucasin has been also identified as a STAT3 inhibitor which inhibited STAT3 by binding to its SH2 domain.Nifuroxazide and ODZ10117 have a greater inhibition on STAT3 activation via directly blocking the SH2 domain of STAT3 and blocking the transcription activity 29 .
This study has shown few STAT3 inhibitors that help to overcome STAT3 activity as well as anticancer effects.According to Tomohiro, Von Manstein, and Groner there are also some disadvantages, such as dose-limiting side  effects, multidrug resistance, and extreme side effects.The result is a timetable for the development of essentially new STAT3 inhibitors with improved viability 30 .
The overall analysis of novel STAT3 inhibitors from different Databases such as In-house (Sa32), Zinc(ZINC47009207), Phytochemicals(5459840) and South African (SANC00347) were carried out.The docking interaction of compounds, RMSD, RMSF, DCCM, Rg, and MMPBSA based analysis are discussed below.
From previous study of docking results shows that N4 (inhibitor) which are STAT3 inhibitor interacted with STAT3-SH2 at multiple amino acid residues some are following LYS591, SER636, VAL637, AND GLU638 31 STAT3 inhibitor i-e S31-201with compounds which improved bioavailability and anti-cancer properties.Their mechanism of action depends upon interaction with STAT3-SH2 and phosphor tyrosine 705 site.The paining with active dimmer from tyrosine peptide is represented together with inhibitor S31-201 from H-bind and residues including LYS591, SER611, SER613 and ARG609 32 .
In current study Sorafenib dock as a reference drug of STAT3 and Sorafenib interact with STAT3-SH2 with some common residues same as previous study that is shown in Fig. 2 they also show structures similarity and binding interaction.And South African compounds SANC00347 show some similarity with S31-201 inhibitor with STAT3-SH2 interactions as shown in Fig. 2 S31-201 and SANC00347 show interaction on same active site residue.
RMSD provides detailed information about protein and their structure predictions.In previous studies RMSD of STAT3 inhibitor (−)-Epigallocatechingallate-complex, Saikosaponin D complex, Kaempferol-3-O-rutinoside complexes and Ginsenoside RK1 and Picroside.On the other hand, the bound state of the protein can stabilize between 50 and 100 ns.For the complex the RMSD was found to exceed 1.5 nm, with multiple variations for Ginsenoside RK1 and a decrease in stability for Picroside I within 75 ns.Both compounds showed lower stability and higher RMSD due to multiple amplitudes.A number of STAT3 complexes are observed by evaluating the active behavior of backbone residues and atomic position variations 24 .In the current study RMSD analysis of novel STAT3 inhibitors such as Sa32, ZINC47009207, 545984 and SANC00347 complexes show stability but also fluctuate at some positions as compared to Reference.But Sa32 and ZINC47009207 best RMSD and binding affinity shows in Fig. 4.
The previous RMSF result provides valuable information about the dynamic behavior of the residues.A reduced flexibility of the complexes will occur due to a small change in atomic positional Hicks 24 .
In the current study, RMSF is the fundamental factor that provides data on the structural compliance of each residue in the system.The values of the following that is Ref, 5459840, ZINC47009207, Sa32 and SANC00347 complexes were calculated Fig. 5 in RMSF we found that fluctuation was found in the region including 18-21, 39-41, 43-51, 122-127 and 160-164 respectively.Some hikes present which are not active site residues.
The current STAT3 inhibitors which are analyzed by Rg analysis and as an important residue amplitudes and backbone deviation, more comprehensive check, general compression in all complexes were needed.The analysis of Rg shows that all complexes of STAT3 i.e., Sa32 and ZINC47009207, 5459840 and SANC00347 showed a different pattern of compared to Ref, as shown in Fig. 6.The dynamic behavior of protein-ligand complexes reveal that binding alters stability and residual flexibility, there by inducing therapeutic clouds.
To investigate the structural changes in the binding pockets of STAT3 protein, to determine the interacting effects of selected compounds in the active site of STAT3 on positively correlated and negatively correlated movements.Hence, for each complex system, the dynamic cross-plots of the correlation matrix are plotted for  www.nature.com/scientificreports/ 100 ns MD simulation, which analyzed the regions of negative and positive correlation, such as cyan to dark blue region and red to green, as shown in Fig. 7A-E.MMPBSA Binding Free Energy (Kcal/mol) is calculated for nominated complexes for estimated protein-ligand complexesof 100 ns MD trajectory.The overall results of complexes were compared and on the basis of MMPBSA analysisSa32 hadthe best binding energy which is shown in Table 4. Sa32 is also best in all analyses like RMSD, and Rg indicating the strong binding affinity of the Sa32 compound toward the receptor.

Conclusion
In the current study, novel STAT3 inhibitors were identified with a comprehensive screening method.The Ph4based virtual screening, docking, and MD simulations were carried out.Additionally, 25 complexes were selected based onbest docking scores which were further screened for the drug-likeness.ADMET drug similarity was utilized to filter the properties of pharmacokinetic of the designed ligand-protein complexes.Lastly, 5 molecules were selected having excellent properties as well as stable binding modes.In addition, Sa32 has a valuable and novel STAT3 inhibitor, which will further serve as a reference for the development of new effective STAT3 inhibitors.Sa32 compound is overall showing the best response toward STAT3 inhibition which is confirmed through RMSD, RMSF, RG and DCCM analysis.It is further recommended to perform Invitro study on the final predicted hits to evaluate the inhibitory potential of these compounds.

was found as 16
.9-17.3 Å.The Rg value of ZINC47009207 (replica 1) was found as 17.2-17.4Å.The Rg value of SANC00347 (replica 1) was 17.2-17.5Å.The Rg value of 5459840 (replica 1), and Ref (replica 1) were found to be 17.1-17.3Å and 17.2-17.41Å respectively.Whereas, other 5459840 and SANC00347 complexes were indicated to be less compact over time as compared to Ref.The dynamic behavior of protein-ligand complexes reveal that binding alters stability and remaining flexibility, thereby inducing therapeutic ability.In replica 2 the Rg of the SANC00347 was observed to be 17.1-17.5Å.The Rg of the complex 5459840 was observed to be 17.0-17.4Å.The Rg value of ZINC47009207, Sa32 and Ref was found to be 16.9-17.1,16.6-17.1Å and 16.4-17.1Å respectively.The Rg plots of replica 2 of all the complexes are shown in Fig. S6 (Supplementary).As compared to replica 2 all the complexes were more compact and revealed high stability in replica 1.

Figure 2 .
Figure 2. The binding interactions of compounds 1, 2, 3, 4 and 5.The interacting residues are depicting in blue color.The compounds are coded with pink color (stick model).Bonds are represented by green lines.The bond distances are labeled in green color.Compound's names are written at bottom.

Figure 3 .
Figure 3. SwissADME radar of various bioactive drug likeness molecules, the red areas in discern constitute every attribute of lipophilicity, solubility, molecular weight, and versatility.

Figure 4 .
Figure 4. Root-mean square deviation (RMSD) analysis, black plot in graph shows the reference complex and Red, Green, Blue and Sky blue plotted indicates 5459840, ZINC47009207, Sa32 and SANC00347 (replica 1) respectively.

Figure 5 .
Figure 5. Root-mean-square fluctuation analysis the black line in graph represents reference complex.And the rest of Red, Green, Blue and Sky blue indicate 5459840, ZINC47009207, Sa32 and SANC00347 (replica 1) respectively.

Table 1 .
Pharmacophore model evaluation based on Gunner-Henry (GH) scoring method.

Table 2 .
The docking scores of top 25 ligands and their interactions with SH2 domain of STAT3.

Table 3 .
Drug likeness properties of the nominated compounds.

Table 4 .
MMPBSA free binding energy of selected compounds.